Defines an extractor for various types of content from PDF pages.

Initializes a new PDFContentExtractor object.

Extracts the color spaces defined in the page resources.
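In the PDF file format, a page's /Resources dictionary can hold a /ColorSpace subdictionary that maps resource names either to the name of a device color space or to an array whose first element names the color space family. The following Python sketch lists those entries. It is not this class's implementation; it assumes a hypothetical in-memory model in which PDF names are strings starting with "/", arrays are lists and dictionaries are dicts, and the function names are illustrative only.

```python
from typing import Any, Dict, Union

# Hypothetical in-memory model of parsed PDF objects: names are str values
# starting with "/", arrays are lists and dictionaries are dicts keyed by name.
PdfObject = Union[str, int, float, bytes, list, dict]


def colorspace_family(cs: PdfObject) -> str:
    """Return the color space family, e.g. '/DeviceRGB' or '/ICCBased'."""
    if isinstance(cs, str) and cs.startswith("/"):
        return cs  # device color spaces may be given directly as a name
    if isinstance(cs, list) and cs and isinstance(cs[0], str):
        return cs[0]  # other spaces are arrays whose first element is the family
    raise ValueError(f"unrecognised color space definition: {cs!r}")


def extract_colorspaces(resources: Dict[str, Any]) -> Dict[str, str]:
    """Map each resource name in the /ColorSpace subdictionary to its family.

    Only the dictionary passed in is inspected; /Resources inherited from an
    ancestor node in the page tree is not resolved in this sketch.
    """
    return {name: colorspace_family(cs)
            for name, cs in resources.get("/ColorSpace", {}).items()}
```

For example, an entry /CS0 mapped to [/ICCBased stream] is reported as /ICCBased.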

Extracts the page content stream as a list of graphic operators with their operands.
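A content stream uses postfix notation: the operands come first, followed by the operator that consumes them. As a rough illustration of what such an operator list looks like, the Python sketch below tokenizes a simplified subset of the syntax and pairs each operator with its operands. It is not this class's implementation and the function names are hypothetical. It assumes the stream is already decompressed and decoded as Latin-1, leaves escape sequences in literal strings unprocessed, and does not handle inline images (BI/ID/EI).

```python
import re
from typing import Any, Iterator, List, Tuple

WHITESPACE = " \t\r\n\f\0"
DELIMITERS = "()<>[]{}/%"
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")
KEYWORDS = {"true": True, "false": False, "null": None}


def tokenize(data: str) -> Iterator[Tuple[str, Any]]:
    """Yield (kind, value) tokens from a decompressed content stream."""
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if ch in WHITESPACE:
            i += 1
        elif ch == "%":  # comment: skip to end of line
            while i < n and data[i] not in "\r\n":
                i += 1
        elif ch == "(":  # literal string with balanced parentheses
            depth, j = 1, i + 1
            while j < n and depth:
                if data[j] == "\\":  # skip the escaped character (kept raw)
                    j += 2
                    continue
                if data[j] == "(":
                    depth += 1
                elif data[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated literal string")
            yield ("operand", data[i + 1 : j - 1])
            i = j
        elif data.startswith("<<", i) or data.startswith(">>", i):
            yield (data[i : i + 2], None)  # dictionary delimiters
            i += 2
        elif ch == "<":  # hexadecimal string; an odd final digit is padded with 0
            j = data.index(">", i)
            digits = "".join(data[i + 1 : j].split())
            if len(digits) % 2:
                digits += "0"
            yield ("operand", bytes.fromhex(digits).decode("latin-1"))
            i = j + 1
        elif ch in "[]":
            yield (ch, None)
            i += 1
        else:  # name, number, keyword or operator: a run of regular characters
            j = i + 1 if ch == "/" else i
            while j < n and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
                j += 1
            if j == i:
                raise ValueError(f"unexpected delimiter {ch!r} at offset {i}")
            word = data[i:j]
            if word.startswith("/"):
                yield ("operand", word)  # name object
            elif NUMBER.fullmatch(word):
                yield ("operand", float(word) if "." in word else int(word))
            elif word in KEYWORDS:
                yield ("operand", KEYWORDS[word])
            else:
                yield ("operator", word)
            i = j


def parse_operations(data: str) -> List[Tuple[str, List[Any]]]:
    """Group each operator with the operands that precede it."""
    ops: List[Tuple[str, List[Any]]] = []
    stack: List[List[Any]] = [[]]  # arrays and dictionaries open a nested list
    for kind, value in tokenize(data):
        if kind in ("[", "<<"):
            stack.append([])
        elif kind == "]":
            items = stack.pop()
            stack[-1].append(items)
        elif kind == ">>":
            items = stack.pop()
            stack[-1].append(dict(zip(items[::2], items[1::2])))
        elif kind == "operator":
            if len(stack) != 1:
                raise ValueError(f"operator {value!r} inside an array or dictionary")
            ops.append((value, stack[0]))
            stack[0] = []
        else:
            stack[-1].append(value)
    return ops
```

For instance, the input `q 1 0 0 1 72 720 cm` yields ("q", []) followed by ("cm", [1, 0, 0, 1, 72, 720]).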

Extracts the information related to the images displayed on the page.
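Image XObjects are painted with the Do operator into the unit square of the space defined by the current transformation matrix (CTM), which q and Q save and restore and cm modifies. The following Python sketch tracks the CTM over (operator, operands) pairs and records where each image lands. It is a hypothetical illustration, not this class's API: it assumes the caller already knows which XObject names are images (that requires the /Subtype entry in the page's /XObject resources), starts from an identity CTM so results are in default user space, and does not descend into form XObjects or handle inline images.

```python
from typing import Any, Dict, List, Set, Tuple

Matrix = Tuple[float, float, float, float, float, float]  # PDF [a b c d e f]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m: Matrix, n: Matrix) -> Matrix:
    """Concatenate two PDF matrices: the result applies m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def transform(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through m: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def find_images(ops: List[Tuple[str, List[Any]]],
                image_names: Set[str]) -> List[Dict[str, Any]]:
    """Record the CTM and bounding box of each image painted with Do."""
    ctm: Matrix = IDENTITY
    saved: List[Matrix] = []  # graphics state stack maintained by q / Q
    images: List[Dict[str, Any]] = []
    for op, operands in ops:
        if op == "q":
            saved.append(ctm)
        elif op == "Q" and saved:
            ctm = saved.pop()
        elif op == "cm":
            # cm pre-multiplies the CTM: CTM' = M x CTM
            ctm = multiply(tuple(float(v) for v in operands), ctm)
        elif op == "Do" and operands and operands[0] in image_names:
            # The image fills the unit square of the space defined by the CTM.
            corners = [transform(ctm, x, y) for x, y in ((0, 0), (1, 0), (0, 1), (1, 1))]
            xs = [p[0] for p in corners]
            ys = [p[1] for p in corners]
            images.append({"name": operands[0], "ctm": ctm,
                           "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return images
```

Transforming all four corners, rather than only two, keeps the bounding box correct when the CTM includes rotation or skew.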

Extracts the content of an optional content group.
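In a content stream, content belonging to an optional content group is bracketed by marked-content operators with the tag /OC, as in /OC /name BDC ... EMC, where /name is a key in the page's /Properties resources that refers to the group. A minimal Python sketch, assuming the caller has already resolved that resource name and parsed the stream into (operator, operands) pairs; it tracks marked-content nesting so that inner BMC/BDC ... EMC pairs stay inside the extracted section. The function name is hypothetical, and optional content applied through an XObject's /OC entry is not covered.

```python
from typing import Any, List, Optional, Tuple

Operation = Tuple[str, List[Any]]


def extract_ocg_content(ops: List[Operation], property_name: str) -> List[Operation]:
    """Return the operators enclosed in '/OC <property_name> BDC ... EMC'."""
    result: List[Operation] = []
    depth = 0                        # current marked-content nesting depth
    inside_at: Optional[int] = None  # depth at which the matching /OC section opened
    for op, operands in ops:
        if op in ("BMC", "BDC"):
            depth += 1
            if (inside_at is None and op == "BDC"
                    and list(operands[:2]) == ["/OC", property_name]):
                inside_at = depth
                continue  # the opening marker itself is not part of the content
        elif op == "EMC":
            if inside_at == depth:
                inside_at = None
                depth -= 1
                continue  # nor is the matching closing marker
            depth -= 1
        if inside_at is not None:
            result.append((op, operands))
    return result
```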

Extracts the text from the PDF page.

Extracts the text from the PDF page as a collection of objects.

Extracts the text fragments from the PDF page.

Extracts the page content as a list of visual objects.

Extracts the text from the PDF page as a collection of words.
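Word extraction usually has to infer word boundaries from glyph positions, because a PDF page does not necessarily store space characters between words. One common heuristic is sketched below in Python: start a new word at a whitespace glyph, at a change of baseline, or when the horizontal gap to the previous glyph exceeds a threshold. The Glyph record, the fixed 1-point threshold and the assumption that glyphs arrive in reading order are all hypothetical; the actual extractor may use a different rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """Hypothetical glyph record: Unicode text plus position and width in points."""
    text: str
    x: float
    y: float
    width: float


def group_words(glyphs: List[Glyph], gap_threshold: float = 1.0) -> List[str]:
    """Split glyphs, assumed to be in reading order, into words."""
    words: List[str] = []
    current = ""
    prev: Optional[Glyph] = None
    for g in glyphs:
        if g.text.isspace():  # an explicit space always ends the current word
            if current:
                words.append(current)
            current, prev = "", None
            continue
        if prev is not None and (
            g.y != prev.y  # different baseline (exact match in this sketch)
            or g.x - (prev.x + prev.width) > gap_threshold  # visible gap
        ):
            words.append(current)
            current = ""
        current += g.text
        prev = g
    if current:
        words.append(current)
    return words
```

A production heuristic would likely scale the gap threshold by font size instead of using a fixed value in points.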

Gets the CMap factory.

Searches the page content for the specified text.
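A search that should find text split across several show-text operations can join the extracted fragments, search the combined string, and map each match back to the fragments it touches. The Python sketch below illustrates that approach with hypothetical plain-string fragments; it is not this class's API. It is case-sensitive, reports overlapping matches, and joins fragments without inserting separators, which is an assumption that may merge text the page displays apart.

```python
import bisect
from typing import List, Tuple


def search_fragments(fragments: List[str], query: str) -> List[Tuple[int, int]]:
    """Return (first, last) fragment indices spanned by each match of query."""
    if not query:
        return []
    starts: List[int] = []  # offset of each fragment in the joined text
    pos = 0
    for frag in fragments:
        starts.append(pos)
        pos += len(frag)
    text = "".join(fragments)
    matches: List[Tuple[int, int]] = []
    i = text.find(query)
    while i != -1:
        end = i + len(query) - 1  # offset of the last matched character
        # bisect_right skips empty fragments that share the same start offset
        first = bisect.bisect_right(starts, i) - 1
        last = bisect.bisect_right(starts, end) - 1
        matches.append((first, last))
        i = text.find(query, i + 1)  # i + 1 allows overlapping matches
    return matches
```

For example, with the fragments "Hel", "lo wor" and "ld", a search for "world" spans fragments 1 through 2.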

Sets the CMap factory.